Conversation
I think I've finally got what I want out of this now.

Whoops. ⏰ Hopefully fixed.

Evidence that this fixes the original problem in #3357 ... "After" examples:

Thanks! I confirm that this fixes issue #3357.
lib/iris/_lazy_data.py (outdated)

from __future__ import (absolute_import, division, print_function)
from six.moves import (filter, input, map, range, zip)  # noqa

from collections import Iterable
Currently in Python 3.7, you get the following DeprecationWarning:

>>> from collections import Iterable
__main__:1: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working

Could you adopt the following pattern:

try:  # Python 3
    from collections.abc import Iterable
except ImportError:  # Python 2
    from collections import Iterable
lbdreyer left a comment:
Overall looks good!
Just a few questions...
lib/iris/_lazy_data.py (outdated)

# Fetch the default 'optimal' chunksize from the dask config.
limit = dask.config.get('array.chunk-size')
# Convert to bytes
limit = da.core.parse_bytes(limit)
Why did you choose to get parse_bytes from da.core rather than from dask.utils?
I think I copied this from the dask source code somewhere.
I will fix it.
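For reference, a minimal sketch of the suggested form, importing the helper from its documented home in dask.utils (the '128MiB' value shown is just dask's usual default, not quoted from this PR):

import dask.config
from dask.utils import parse_bytes

limit = dask.config.get('array.chunk-size')  # e.g. '128MiB'
limit = parse_bytes(limit)                   # -> 134217728 bytes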
lib/iris/_lazy_data.py (outdated)

# Create result chunks, starting with a copy of the input.
result = list(chunks)
if shape is None:
When would shape be None? I don't think we should be allowing for shape=None. You also iterate through shape on line 105.
I think I was trying to mimic the API of dask.array.core.normalize_chunks, in case we can use that in the future.
Actually I haven't achieved that, and we never use this option, so I will remove it.
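For comparison, a sketch of the dask API being mimicked: dask's own helper handles shape expansion itself, with shape as a keyword. (The printed result assumes dask's usual default 128 MiB chunk-size limit, and is illustrative only.)

import dask.array as da
import numpy as np

# 'auto' dimensions are sized against the configured byte limit.
chunks = da.core.normalize_chunks((100, 'auto'), shape=(200, 300),
                                  dtype=np.dtype('f8'))
print(chunks)  # e.g. ((100, 100), (300,))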
lib/iris/fileformats/netcdf.py (outdated)

fill_value)
chunks = cf_var.cf_data.chunking()
- # Chunks can be an iterable, None, or `'contiguous'`.
+ # Chunks can be an iterable, or `'contiguous'`.
I don't understand this change. You have removed None, and yet two lines down it sets chunks = None.
There are two different sets of conventions here.
- The 'chunks' value that nc.Variable.chunking() returns cannot be None, I believe. I don't know quite why the comment ever said that: it just seems wrong to me.
- The 'chunks' keyword we pass to as_lazy_data can be None, and it absolutely can't be 'contiguous', which is why we are converting here.
I will try to clarify in the comments.
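A hedged sketch of the conversion being described (variable names follow the diff above; the exact surrounding code in netcdf.py is assumed, not quoted):

chunks = cf_var.cf_data.chunking()
# netCDF4 gives back a list of chunk lengths, or the string 'contiguous';
# as_lazy_data instead expects an iterable or None, so translate here.
if chunks == 'contiguous':
    chunks = None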
lib/iris/_lazy_data.py (outdated)

# Return chunks unchanged, for types of invocation we don't comprehend.
if (any(elem <= 0 for elem in shape) or
        not isinstance(chunks, Iterable) or
        len(chunks) != len(shape)):
You don't have explicit tests for this check.
I'm not sure how thorough we want to be with testing this. It does seem like a bit of overkill to add tests for
_optimum_chunksize((200, 300), (1, 200, 300))
This was an attempt to allow alternative, dask-type chunks arguments in 'as_lazy_data'.
Obviously we don't use anything like that at present. The only immediate need is to skip shapes with a zero in them (see comment).
I now see this is wrong anyway, as that initial test clause assumes that shape is iterable!
I will simplify to just what we need, and add a test case.
... after some thought, I have moved this to the caller and documented the behaviour there.
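A minimal sketch of how such a guard might look in the caller (placement, names and the _optimum_chunksize signature are assumed from the thread, not quoted from the PR): an array with a zero-length dimension cannot usefully be re-chunked, so the chunks pass through unchanged.

if 0 in shape:
    # Zero-sized dimensions break the chunk-size arithmetic: leave chunks alone.
    result = chunks
else:
    result = _optimum_chunksize(chunks, shape)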
Attempted to address review comments. Note: I didn't make any changes to address the comment about the size of divided chunks.
limitcall_patch = self.patch('iris._lazy_data._optimum_chunksize')
test_shape = (2, 1, 0, 2)
data = self._dummydata(test_shape)
result = as_lazy_data(data, chunks=test_shape)
F841 local variable 'result' is assigned to but never used
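One hedged way to resolve the F841 warning here: the patched _optimum_chunksize records its own calls, so the return value is never inspected and the assignment can simply be dropped.

limitcall_patch = self.patch('iris._lazy_data._optimum_chunksize')
test_shape = (2, 1, 0, 2)
data = self._dummydata(test_shape)
as_lazy_data(data, chunks=test_shape)  # return value deliberately unused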
Thanks @lbdreyer, this fixes the outstanding errors. I'm happy that those other uses in testing of as_lazy_data(.. chunks=X ..) are all now unnecessary, and we don't need to support any more complex usages: we accept that this 'chunks' keyword is not much like the dask one, and that is ok: the new docstring reflects this, I hope. Regarding the "lost" test ...
Meanwhile, though ... I'm still wondering about your comment "This does end with an array that is unequally split into chunks". So far: a good practical example ... (given the current default dask chunk size, which is 128 MiB). To be continued ...
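To illustrate the "unequal split" point, a small hedged example (the numbers are illustrative, not from this PR): when a dimension is not an exact multiple of the chosen chunk length, the final chunk comes out smaller.

import dask.array as da
import numpy as np

# 2500 rows do not divide evenly into 1300-row chunks, so the
# second chunk is short: the array is unequally split.
arr = da.from_array(np.zeros((2500, 100)), chunks=(1300, 100))
print(arr.chunks)  # ((1300, 1200), (100,))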
Interested? @cpelley

Hi @lbdreyer

Hi again @lbdreyer

This is a really great change! 💯 Thanks for persisting with it @pp-mo!
Revised chunking policy, which will mostly affect loading from netcdf files.
Key change: allows multiplying up as well as dividing chunks, mainly to fix #3357.
Closes #3357, #3362
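A hedged, purely arithmetic illustration of "multiplying up" (the figures are invented for illustration, not taken from the PR): a file stored one row per chunk produces a huge task graph unless several file chunks are merged into each dask chunk.

import numpy as np

shape = (10000, 1000)          # array shape in the file
file_chunks = (1, 1000)        # netcdf chunking: one row per chunk
itemsize = np.dtype('f8').itemsize

rows_per_chunk = 1600          # e.g. merge 1600 file chunks into one
chunk_bytes = rows_per_chunk * shape[1] * itemsize
n_chunks = -(-shape[0] // rows_per_chunk)  # ceiling division
print(chunk_bytes, n_chunks)   # 12800000 bytes each, 7 chunks (was 10000)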